Constructing Sample-to-Class Graph for Few-Shot Class-Incremental Learning
Few-shot class-incremental learning (FSCIL) aims to build a machine learning
model that can continually learn new concepts from a few data samples without
forgetting knowledge of old classes.
The challenge of FSCIL lies in the limited data of new classes, which not
only leads to significant overfitting but also exacerbates the notorious
catastrophic forgetting problem. As shown in early studies, building sample
relationships is beneficial for learning from few-shot samples. In this paper,
we extend this idea to the incremental scenario and propose a Sample-to-Class
(S2C) graph learning method for FSCIL.
Specifically, we propose a Sample-level Graph Network (SGN) that focuses on
analyzing sample relationships within a single session. This network helps
aggregate similar samples, ultimately leading to the extraction of more refined
class-level features.
Then, we present a Class-level Graph Network (CGN) that establishes
connections across class-level features of both new and old classes. This
network plays a crucial role in linking the knowledge between different
sessions and helps improve overall learning in the FSCIL scenario. Moreover, we
design a multi-stage strategy for training the S2C model, which mitigates the
training challenges posed by limited data in the incremental process.
The multi-stage strategy builds the S2C graph from the base to the few-shot
stages and improves its capacity via an extra pseudo-incremental stage.
Experiments on three popular benchmark datasets show that our method
clearly outperforms the baselines and sets new state-of-the-art results in
FSCIL.
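To make the pipeline concrete, here is a minimal PyTorch sketch of the sample-to-class idea: similarity-weighted aggregation over a sample-level graph, followed by per-class pooling into class-level nodes. The layer name SampleGraphLayer, the feature dimension, and the mean-pooling choice are illustrative assumptions, not the paper's actual SGN/CGN architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SampleGraphLayer(nn.Module):
    """Similarity-weighted aggregation over a sample-level graph (hypothetical)."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                       # x: (num_samples, dim)
        z = F.normalize(x, dim=-1)
        adj = F.softmax(z @ z.t(), dim=-1)      # soft adjacency from cosine similarity
        return x + self.proj(adj @ x)           # residual graph aggregation

def class_level_features(x, labels, num_classes):
    """Mean-pool refined sample features into one node per class."""
    return torch.stack([x[labels == c].mean(dim=0) for c in range(num_classes)])

# Toy 5-way 5-shot session:
x = torch.randn(25, 64)
labels = torch.arange(5).repeat_interleave(5)
refined = SampleGraphLayer(64)(x)                         # sample-level graph (SGN-like)
class_nodes = class_level_features(refined, labels, 5)    # (5, 64) class-level nodes
```

The class-level nodes produced this way would then feed a CGN-style graph that connects new-class and old-class features across sessions.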
Text-to-Realistic-Image Generation with Attentional Concatenation Generative Adversarial Networks
In this paper, we propose an Attentional Concatenation Generative Adversarial Network (ACGAN) aimed at generating 1024 × 1024 high-resolution images. First, we propose a multilevel cascade structure for text-to-image synthesis. During training, we gradually add new layers and, at the same time, use the results and word vectors from the previous layer as inputs to the next layer to generate high-resolution images with photo-realistic details. Second, the deep attentional multimodal similarity model is introduced into the network, and we match word vectors with images in a common semantic space to compute a fine-grained matching loss for training the generator. In this way, we can attend to fine-grained word-level information in the semantics. Finally, a diversity measure is added to the discriminator, which enables the generator to obtain more diverse gradient directions and improves the diversity of generated samples. The experimental results show that the inception scores of the proposed model on the CUB and Oxford-102 datasets reach 4.48 and 4.16, improvements of 2.75% and 6.42% over Attentional Generative Adversarial Networks (AttenGAN). The ACGAN model performs better on text-to-image generation, and the generated images are closer to real images.
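The fine-grained matching loss builds on word-to-region attention. The sketch below is a rough approximation in the spirit of the deep attentional multimodal similarity model, not its exact formulation: it scores each word vector against attention-weighted image-region features in a shared semantic space. The shapes, the gamma sharpening factor, and the function name are assumptions.

```python
import torch
import torch.nn.functional as F

def word_region_matching(words, regions, gamma=5.0):
    """words: (T, dim) word vectors; regions: (R, dim) image-region features."""
    w = F.normalize(words, dim=-1)
    r = F.normalize(regions, dim=-1)
    attn = F.softmax(gamma * (w @ r.t()), dim=-1)    # (T, R) word-to-region attention
    context = attn @ r                               # (T, dim) per-word region context
    scores = F.cosine_similarity(w, context, dim=-1) # per-word matching scores
    return scores.mean()                             # scalar image-text match score

# A full matching loss would contrast this score for paired vs. mismatched
# text-image pairs, e.g. via a softmax over the batch.
score = word_region_matching(torch.randn(12, 256), torch.randn(49, 256))
```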
Dynamic V2X Autonomous Perception from Road-to-Vehicle Vision
Vehicle-to-everything (V2X) perception is an innovative technology that
enhances vehicle perception accuracy, thereby elevating the security and
reliability of autonomous systems. However, existing V2X perception methods
focus on static scenes, mainly from vehicle-based vision, which is constrained
by sensor capabilities and communication loads. To adapt V2X perception models
to dynamic scenes, we propose to build V2X perception from road-to-vehicle
vision and present the Adaptive Road-to-Vehicle Perception (AR2VP) method. In
AR2VP, we leverage roadside units to offer stable, wide-range sensing
capabilities and to serve as communication hubs. AR2VP is devised to tackle both
intra-scene and inter-scene changes. For the former, we construct a dynamic
perception representing module, which efficiently integrates vehicle
perceptions, enabling vehicles to capture a more comprehensive range of dynamic
factors within the scene. Moreover, we introduce a road-to-vehicle perception
compensating module, aimed at preserving as much roadside-unit perception
information as possible in the presence of intra-scene changes. For inter-scene
changes, we implement an experience replay mechanism that leverages the roadside
unit's storage capacity to retain a subset of historical scene data, maintaining
model robustness in response to inter-scene shifts. We conduct perception
experiments on 3D object detection and segmentation, and the results show that
AR2VP excels in both performance-bandwidth trade-offs and adaptability within
dynamic environments.
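The inter-scene component hinges on a bounded replay store at the roadside unit. Below is a minimal sketch of such a buffer using reservoir sampling; the capacity, the sampling policy, and the class name are our assumptions, not necessarily the paper's exact scheme.

```python
import random

class RoadsideReplayBuffer:
    """Bounded store of historical scene samples at a roadside unit (hypothetical)."""
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0

    def add(self, sample):
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(sample)
        else:
            i = random.randrange(self.seen)   # reservoir sampling keeps a
            if i < self.capacity:             # uniform subset of all history
                self.buffer[i] = sample

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

# Training on a new scene would interleave fresh frames with
# buffer.sample(k) frames replayed from earlier scenes.
```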
Multi-Label Continual Learning using Augmented Graph Convolutional Network
Multi-Label Continual Learning (MLCL) builds a class-incremental framework in
a sequential multi-label image recognition data stream. The critical challenges
of MLCL are the construction of label relationships on past-missing and
future-missing partial labels of training data and the catastrophic forgetting
of old classes, resulting in poor generalization. To solve these problems, this
study proposes an Augmented Graph Convolutional Network (AGCN++) that can
construct cross-task label relationships in MLCL and mitigate catastrophic
forgetting. First, we build an Augmented Correlation Matrix (ACM) across all
seen classes, where the intra-task relationships derive from hard label
statistics, while the inter-task relationships leverage hard and soft labels
from the data and a constructed expert network. Then, we propose a novel
partial label encoder (PLE) for MLCL, which can extract dynamic class
representations for each partial-label image as graph nodes and help generate
soft labels to create a more convincing ACM and suppress forgetting. Last, to
suppress the forgetting of label dependencies across old tasks, we propose a
relationship-preserving constraint to construct label relationships. The
inter-class topology can be augmented automatically, which also yields
effective class representations. The proposed method is evaluated on two
multi-label image benchmarks. The experimental results show that the proposed
method is effective for MLCL image recognition and can build convincing
correlations across tasks even if the labels of previous tasks are missing.
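To illustrate how an intra-task block of such a correlation matrix could be built from hard label statistics, here is a small NumPy sketch; the conditional-probability normalization and the block layout are assumptions rather than the paper's exact ACM construction.

```python
import numpy as np

def intra_task_correlations(hard_labels):
    """hard_labels: (num_images, num_classes) binary multi-label matrix."""
    cooc = hard_labels.T @ hard_labels               # pairwise co-occurrence counts
    freq = np.maximum(hard_labels.sum(axis=0), 1.0)  # per-class frequency (avoid /0)
    return cooc / freq[:, None]                      # roughly P(col class | row class)

# Toy task with 3 classes over 4 images:
labels = np.array([[1, 1, 0],
                   [1, 0, 1],
                   [0, 1, 1],
                   [1, 1, 1]], dtype=float)
acm_block = intra_task_correlations(labels)          # (3, 3) intra-task block
# Inter-task entries (old vs. new classes) would be estimated analogously from
# soft labels, since old-class hard labels are missing in the MLCL stream.
```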